String matching with alphabet sampling

Authors

  • Francisco Claude
  • Gonzalo Navarro
  • Hannu Peltola
  • Leena Salmela
  • Jorma Tarhio

Abstract

We introduce a novel alphabet sampling technique for speeding up both online and indexed string matching. We choose a subset of the alphabet and extract the corresponding subsequence of the text. Online or indexed searching is then carried out on the extracted subsequence, and candidate matches are verified in the full text. We show that this speeds up online searching, especially for moderate to long patterns, by a factor of up to 5, while using 14% extra space in our experiments. For indexed searching we achieve indexes that are as fast as the classical suffix array, yet occupy less than 50% extra space (instead of the usual 400%). Our experiments show that no competitive alternatives exist over a wide space/time range.
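The filter-and-verify idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names (`sample_text`, `find_all`, `sampled_search`) are hypothetical. Python's `str.find` stands in for whatever exact matcher runs on the sampled text. The sketch also stores a full position map from the sampled text back to the original text for simplicity, whereas the paper reports only 14% extra space. The caller supplies the sampled alphabet, and choosing it well is not covered here.

```python
from typing import List, Optional, Set, Tuple


def sample_text(text: str, sampled: Set[str]) -> Tuple[str, List[int]]:
    """Keep only characters in the sampled alphabet and record where each came from.

    Hypothetical helper: the full position map is a simplification of this sketch.
    """
    chars: List[str] = []
    positions: List[int] = []
    for i, c in enumerate(text):
        if c in sampled:
            chars.append(c)
            positions.append(i)
    return "".join(chars), positions


def find_all(haystack: str, needle: str) -> List[int]:
    """All (possibly overlapping) occurrences of needle, using str.find as the matcher."""
    hits: List[int] = []
    k = haystack.find(needle)
    while k != -1:
        hits.append(k)
        k = haystack.find(needle, k + 1)
    return hits


def sampled_search(text: str, pattern: str, sampled: Set[str]) -> List[int]:
    """Report every start of pattern in text by searching the sampled subsequence first."""
    m = len(pattern)
    # Offset of the first sampled character inside the pattern (None if there is none).
    offset: Optional[int] = next((j for j, c in enumerate(pattern) if c in sampled), None)
    if offset is None:
        # The pattern has no sampled character, so the filter cannot help: scan the full text.
        return find_all(text, pattern)
    # Built per call for brevity; a real searcher would build this once per text.
    sampled_text, positions = sample_text(text, sampled)
    sampled_pattern = "".join(c for c in pattern if c in sampled)
    result: List[int] = []
    for k in find_all(sampled_text, sampled_pattern):
        start = positions[k] - offset  # candidate start position in the full text
        # Verification step: the sampled match is only a candidate until checked in full.
        if start >= 0 and text[start:start + m] == pattern:
            result.append(start)
    return result
```

For example, `sampled_search("abracadabra", "abra", {"b", "r"})` searches for `"br"` in the sampled text `"brbr"`, maps both hits back to candidates 0 and 7, and confirms both. The filter never misses an occurrence: any true match of the pattern contributes exactly its sampled characters, contiguously and in order, to the sampled text. The same filter-then-verify step would apply if the search over `sampled_text` were done with an index such as a suffix array instead of `str.find`.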

Similar resources

Approximate String Matching with Reduced Alphabet

We present a method to speed up approximate string matching by mapping the factual alphabet to a smaller alphabet. We apply the alphabet reduction scheme to a tuned version of the approximate Boyer-Moore algorithm utilizing the Four-Russians technique. Our experiments show that the alphabet reduction makes the algorithm faster. Especially in the k-mismatch case, the new variation is faster tha...

Constant-Space String Matching with Smaller Number of Comparisons: Sequential Sampling

A new string-matching algorithm working in constant space and linear time is presented. It is based on a powerful idea of sampling, originally introduced in parallel computations. The algorithm uses a sample S which consists of two positions inside the pattern P. First the positions of the sample S are tested against the corresponding positions of the text T, then a version of Knuth-Morris-Prat...

An Efficient Composite-Alphabet Transform for String Matching under a Restricted Alphabet Set

String matching is a problem of finding all occurrences of a short pattern on a relatively long reference string. While a number of methods have been presented, most published implementations assume several restrictions due to some practical issues. We focus on the restriction of the alphabet size, which is usually set to be 256 in many string matching libraries. When strings must be handled ov...

Two Dimensional Matching

There are many solutions to the string matching problem which are strictly linear in the input size and independent of alphabet size. Furthermore, the model of computation for these algorithms is very weak: they allow only simple arithmetic and comparisons of equality between characters of the input. In contrast, algorithms for two dimensional matching have needed stronger models of computation...

Experimental results on string matching over infinite alphabets

Various string matching algorithms have been designed and some experimental works on string matching over finite alphabets have been performed but string matching over infinite alphabets has been little investigated. We present here experimental results where symbols are taken among potentially infinite sets such as integers, reals or composed structures. These results show that in most cases it is...

Journal:
  • J. Discrete Algorithms

Volume 11

Publication date: 2012